Abstract

We describe an extension of ML with records where inheritance is given by ML generic 
polymorphism. All operations on records introduced by Wand in [Wan87] are supported, 
in particular the unrestricted extension of a field, and other operations such as renaming 
of fields are added. The solution relies on both an extension of ML, where the language 
of types is sorted and considered modulo equations [Rem90b], and on a record extension 
of types [Rem90c]. The solution is simple and modular and the type inference algorithm 
is efficient in practice. 

Introduction 

The aim of typechecking is to guarantee that well-typed programs will not produce runtime errors. 
A type error is usually due to a programmer’s mistake, and thus typechecking also helps him in 
debugging his programs. But programmers do not like writing the types of their programs by 
hand. Type inference requires as little type information as the declaration of data structures; then 
all types of programs will be automatically computed. 

Our goal is to provide type inference for labelled product data structures, commonly called 
records, allowing some inheritance between them. 

After recalling related work and defining the operations on records, we first review the solution 
for a finite (and small) set of labels, which was presented in [Rem89], then extend it to a denumerable
set of labels. In the last part we discuss the power and weakness of the solution, describe
some variations, and suggest improvements. 

Why records? 

Before records, data structures were built using product types, as in ML for example. 

("Peter", "John", "Professor", 27, 5467567, 56478356, ("toyota", "old", 8929901))

With records one would write, instead: 

{name = "Peter"; lastname = "John"; job = "Professor"; age = 27; id = 5467567; 
license = 56478356; vehicle = {name = "toyota"; id = 8929901; age = "old"}} 

*Partly supported by research grant NSF IRI86-10617. 
The latter program is definitely more readable than the former. It is more precise, too. Records can
also help send arguments to functions or retrieve their results. More generally, in communication 
between processes records permit the naming of the different ports on which processes can exchange 
information. One nice example of this can be found in the language LCS [Ber88] which is a 
combination of the language ML and the language CCS designed by Robin Milner in 1980 [Mil80]. 
But certainly records have become more popular since we know that they can help implement 
objects with a kind of inheritance [Wan89, CM89]. 

If both variants and records were added, types of data structures could be completely inferred 
without any type declaration, whereas ML requires concrete data type declarations. This is also a 
strong motivation for having records. 

Related work 

Luca Cardelli was the first to claim that functional languages should have record operations.
Because he did not know how to combine records and polymorphism, he designed Amber in 1986 
as a monomorphic language. Later he designed the language FUN where bounded quantification 
was introduced. Bounded quantification is an extension of second order quantification when some 
type inclusions are allowed. This notion is essential in coding operations on records with inheritance. 
In the language QUEST, the successor of FUN [CW85], bounded quantification was extended to 
higher order. 

A lot of work has been done on the semantics of languages with inclusion, initially without 
record operations, which were incorporated later. It is only recently that a semantics of Quest
has been proposed [LC90].

A slight but significant improvement of bounded quantification has been made in [CCH*89] to 
better consider recursive objects; a more general but less tractable system was studied by Pavel 
Curtis [Cur87]. Today, the interest seems to lie in simplifying rather than enriching existing
systems [LC90]. An interesting study whose goal is to remove bounded quantification is [HP90]. 

Records have also been formulated with explicit labelled conjunctive types in the language 
Forsythe [Rey88]. 

In contrast, records in implicitly typed languages have been less studied and all proposed 
extensions of ML are still very restrictive. The language Amber in 1986 did not actually make 
records more popular in functional languages, probably because it was not polymorphic. Inheritance
in Amber was obtained by type inclusion [Car84, Car86]. Records became very attractive in 1987, 
after Mitchell Wand proposed a system in [Wan87] where inheritance was obtained from ML generic 
polymorphism. Though type inference was incomplete for this system, it remains a reference, for 
it was the first concrete proposal for extending ML with records having inheritance. The following year,
complete type inference algorithms were found for a strong restriction of this system [JM88, OB88]. 
They only allowed the extension of a record with a field that was not defined before. Then, the 
present author proposed a complete solution to Wand’s system [Rem89], but it was formalized only 
in the case of a finite set of labels (a solution was also given by Wand in 1988, but the completeness 
was obtained at the cost of a complete set of principal types and the algorithm was explosive in 
practice). Mitchell Wand revisited this approach and extended it with an “and” operation 1 but 
did not provide correctness proofs. The case of an infinite set of labels was addressed in [Rem90a], 
which we review in this article. Works have been contributed by Peter Buneman and Atsushi Ohori, 
simplifying the previous system [OB89, Oho90]. Though the solution of [Oho90] solves only the 
restricted extension, it shares with this work a reliance on a sorted extension of ML, and pushes 
some of the label constraints to the level of sorts. 

1 You may understand it as an “append” on association lists in lisp compared to the “with” operation which should 
be understood as a “cons”. 
Operations on records

In this subsection we show with examples what operations on records are expected and introduce 
the main constructions. We use a CAML-like syntax [CH89, Wei89].

The example we started with already illustrates a few ideas about labels. Like variable names,
labels do not have particular meanings, though choosing good names (good is very subjective)
helps in writing and reading programs. Names can, of course, be reused in different records,
even to build fields of different types. This is illustrated in the three following examples:

let car = {name = "Toyota"; age = "old"; id = 7866};;

let truck = {name = "Blazer"; id = 6587867567};;

let person = {name = "Tim"; age = 31; id = 5656787};;

Note that no declaration previous to the use of labels is needed. The record person is defined on 
exactly the same fields as the record car, though those fields do not have the same intuitive meaning. 
The field age holds values of different types in car and person. 

In the previous examples we built records all at once. But we can also do it step by step. A 
value driver can be defined as being a copy of the record person but with one more field vehicle filled 
with the previously defined car object. 

let driver = {person with vehicle = car};; 

Note that there is no sharing between the records person and driver. You can simply think of it as if the
former were copied into a new empty record before adding the field vehicle to build the latter. This
construction is called the extension of a record with a new field. In this example the field newly 
defined was not present in the record person, but that should not be a restriction. For instance, if 
our driver needs a more robust vehicle, we write 

let truck_driver = {driver with vehicle = truck};;

As above, the operation is not a physical replacement of the vehicle field by a new value 2 . We do 
not wish any constraint between the types of the old and the new values of the vehicle field. To 
distinguish between the two kinds of extensions of a record with a new field, we will say that the 
extension is strict when the new field could not be previously defined and unrestricted otherwise. 
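
The behaviour of the two kinds of extension can be sketched as follows. This is only an illustration in Python, not the CAML-like syntax used in this paper: records are modelled as dictionaries, and the function name `extend` and its `strict` flag are assumptions introduced for the example.

```python
def extend(record, label, value, strict=False):
    """Return a copy of `record` in which `label` is bound to `value`.

    With strict=True the label must not already be defined (strict extension);
    otherwise an existing field is simply overridden (unrestricted extension).
    """
    if strict and label in record:
        raise KeyError(f"field {label!r} is already defined")
    new_record = dict(record)  # copy the fields: the original record is untouched
    new_record[label] = value
    return new_record


car = {"name": "Toyota", "age": "old", "id": 7866}
truck = {"name": "Blazer", "id": 6587867567}
person = {"name": "Tim", "age": 31, "id": 5656787}

driver = extend(person, "vehicle", car, strict=True)  # field was undefined
truck_driver = extend(driver, "vehicle", truck)       # field was already defined
```

Neither person nor driver is modified by these calls: each extension builds a new record, and the old and new values of the vehicle field are unrelated.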

A more general operation than adding a field to a record is the construction of a new record 
from two previously defined ones, taking the union of their defined fields. For instance, assume 
that a car has a good engine but a rusty body and that you cannot start your truck. If you are a 
good mechanic, you could build this strange object 

let repair_truck = {car and truck};;

and drive again. Of course, the semantics of the and construction has to be defined. One question 
which arises is what value should be assigned to fields which are in both car and truck? Usually 
when there is a conflict, i.e. the same field is defined in both records, its value would be taken from
the last record. But you might also expect from a typechecker that it would prevent this situation 
from happening. Although the and construction is less common in the literature, probably because 
it causes more trouble, it seems a very interesting one in different respects. This is what happens 
in the language SML [HMM90] when a structure is opened and extended with another one. In the 
language LCS the visible ports of two processes run in parallel are exactly the ports visible in any 
of them. And as shown by Mitchell Wand [Wan89] multiple inheritance can be coded with an and 
construction. 

2 This operation would be ill typed if truck and car had incompatible types.

The constructions we described above are not exhaustive but are the most common ones. We
should also mention the permutation, renaming and erasure of fields. We described how to build
records, but of course we also want to read them. There is actually a unique construction for that
purpose.

let id x = x.id ;; 
let age x = x.age;; 

This shows that the construction which reads any specified field is a real functional value. But 
as labels are not values, there is no function which could take a label and a record as arguments 
and would read the field of the record corresponding to that label. Thus we need one extraction 
function per label, as for id and age above. Then they can be applied to different records of different 
types but all having the field we want to read. For instance 

age person, age driver;; 

They can also be passed to other functions, as in 
let car_info field = field car;;
car_info age;;

The testing function 
let eq x y = equal x.id y.id;;

should of course accept arguments of different types provided they both have an id field of the same 
type. 

eq car truck;; 

These examples were very simple. We will answer them below, but we will also meet more tricky 
ones. 

1 A small solution when the set of labels is finite 

Though this solution will be made obsolete by the extension to a denumerable set of labels, we 
choose to present it first, since it is very simple and the extension will be based on the same ideas. 
It will also be a very decent solution in some cases when one wants only a few labels. And it 
will emphasize a method for getting more polymorphism in ML (in fact, we will not put more 
polymorphism in ML but we will make more use of it, sometimes in unexpected ways). 

We will sketch the path from Wand’s proposal to this solution, for it may be of some interest to 
describe the method which we think could be applied in other situations. As intuitions are rather 
subjective, and ours may not be yours, section 1.1 can be skipped whenever it does not help.

1.1 The method 

Records are partial functions from a set $\mathcal{L}$ of labels to the set of values. We simplify the problem
by considering only three labels a, b and c. Records can be represented in three field boxes, once
labels have been ordered:

    | a | b | c |

Defining a record is the same as filling some of the fields with some values. For example, we will
put the values 1 and true in the a and c fields and leave the b field undefined.

    | 1 |   | true |
Typechecking means forgetting some information about values; for instance, we will identify two
different numbers and only remember them as being numbers. The structure of types usually
reflects the structure of values, but with fewer details. It is thus natural to type record values
with partial functions from labels to types.

$$\mathcal{L} \rightharpoonup \mathit{types}$$

We first make record types total functions on labels using an explicitly undefined constant abs
("absent").

$$\mathcal{L} \to \mathit{types} \cup \{\mathit{abs}\}$$

In fact, we replace the union by the sum $\mathit{pre}(\mathit{types}) + \mathit{abs}$. We decompose record types as follows:

$$\mathcal{L} \to [1, \mathit{Card}(\mathcal{L})] \to \mathit{pre}(\mathit{types}) + \mathit{abs}$$

The first function is an ordering from $\mathcal{L}$ to the segment $[1, \mathit{Card}(\mathcal{L})]$ and can be set once and for
all. Thus record types can be represented only by the second component, which is a tuple of length
$\mathit{Card}(\mathcal{L})$ of values in $\mathit{pre}(\mathit{types}) + \mathit{abs}$. The previous example is typed by

    | 1 |   | true |  :  $\Pi(\mathit{pre}(\mathit{num}),\ \mathit{abs},\ \mathit{pre}(\mathit{bool}))$

A function $\mathit{extract}_a$ reading the a field shall accept as an argument any record having the a field
defined with a value M, and return M. The a projection of the type of the argument must be
$\mathit{pre}(\tau)$ if $\tau$ is the type of M. We do not care whether other fields are defined or not, so their types
may be anything, i.e. they are variables $\varphi$ and $\psi$. The result has type $\alpha$.

$$\mathit{extract}_a : \Pi(\mathit{pre}(\alpha),\ \varphi,\ \psi) \to \alpha$$
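
The field-by-field reading of these types can be sketched with a small first-order unifier. This is an illustration in Python under stated assumptions: the tuple encoding of types, the helper names (`pre`, `pi`, `unify`, `resolve`), the variable names and the fixed label order (a, b, c) are all choices made for the example, and the occurs check is omitted for brevity.

```python
def is_var(t):
    """Type variables are plain strings; other types are tuples (symbol, args...)."""
    return isinstance(t, str)


def resolve(t, subst):
    """Follow variable bindings until a non-variable or a free variable is reached."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(s, t, subst):
    """Return an extension of `subst` unifying s and t, or raise TypeError."""
    s, t = resolve(s, subst), resolve(t, subst)
    if s == t:
        return subst
    if is_var(s):
        return {**subst, s: t}
    if is_var(t):
        return {**subst, t: s}
    if s[0] != t[0] or len(s) != len(t):
        raise TypeError(f"collision between {s[0]} and {t[0]}")
    for a, b in zip(s[1:], t[1:]):  # unify the arguments pairwise
        subst = unify(a, b, subst)
    return subst


NUM, BOOL, ABS = ("num",), ("bool",), ("abs",)


def pre(t):
    return ("pre", t)


def pi(*fields):
    """Record type over the ordered labels a, b, c: one field type per label."""
    return ("Pi",) + fields


record_type = pi(pre(NUM), ABS, pre(BOOL))  # type of the record {a = 1; c = true}
argument, result = pi(pre("alpha"), "phi", "psi"), "alpha"  # extract_a
subst = unify(argument, record_type, {})
extracted = resolve(result, subst)  # the type num
```

Reading the b field instead, i.e. unifying `pi("phi", pre("alpha"), "psi")` with the same record type, raises a collision between pre and abs, since the b field is undefined.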


1.2 A formulation 


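Because we want a very restricted use of the pre and abs symbols, the language of types will be a
sorted free algebra, $\mathcal{T}(\Sigma, \mathcal{V})$. The set $\mathcal{C}$ of type symbols contains at least an arrow symbol $\to$ and
two symbols pre and abs. We write $\varrho$ for the arity function. The set $\mathcal{K}$ of sorts is the pair of sorts usual
and field, and the signature $\Sigma$ of $\mathcal{C}$ is defined by:

$$\begin{array}{l}
\mathit{pre} : \mathit{usual} \Rightarrow \mathit{field} \\
\mathit{abs} : \mathit{field} \\
\Pi : \underbrace{\mathit{field} \otimes \cdots \otimes \mathit{field}}_{\mathit{card}(\mathcal{L})} \Rightarrow \mathit{usual} \\
\forall f \in \mathcal{C} \setminus \{\mathit{pre}, \mathit{abs}, \Pi\},\quad f : \underbrace{\mathit{usual} \otimes \cdots \otimes \mathit{usual}}_{\varrho(f)} \Rightarrow \mathit{usual}
\end{array}$$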

The extension of ML with sorted types is straightforward. We will not formalize it further, since 
this will be subsumed in the next section. The inference rules are the same as in ML though the 
language of types is sorted. The typing relation defined by the inference rules is still decidable and 
admits principal solutions in the usual sense. 

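In this language, we may assume that the primitive environment is composed of the following
assertions:

$$\begin{array}{l}
\mathit{null} : \Pi(\mathit{abs}, \ldots, \mathit{abs}) \\
\mathit{extract}_a : \Pi(\varphi_1, \ldots, \mathit{pre}(\alpha), \ldots, \varphi_l) \to \alpha \\
\mathit{new}_a : \Pi(\varphi_1, \ldots, \varphi_a, \ldots, \varphi_l) \to \alpha \to \Pi(\varphi_1, \ldots, \mathit{pre}(\alpha), \ldots, \varphi_l)
\end{array}$$

System $\Pi'$

The null constant is the empty record. The $\mathit{extract}_a$ constant reads the a field from its argument,
and the $\mathit{new}_a$ constant extends its first argument on label a with its second argument.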
2 Extension to large records

Though the previous solution is very simple, and perfect when there are only two or three labels 
involved, it is clearly no longer acceptable when the set of labels is getting larger. This is because
record types are proportional to the size of this set, even the type of the null record, which has no
field defined. When a very local use of records is needed, the number of labels may be written with 
only one digit and the solution works perfectly. But in a large system where some records are used 
globally, the number of labels will quickly be over one hundred. 

In any program the number of labels will always be finite, but with modular programming, the 
whole set of labels is not often known at the beginning (though in this case, some of the labels 
may be local to a module and solved independently). In practice, it is thus interesting to reason 
on an “open”, i.e. denumerable, set of labels. From a theoretical point of view it is the only way 
of avoiding the meta-reasoning which would show that any computation done in a system with a 
small set of labels would still be valid in a system with a larger set of labels, and that the typing 
in the latter case could be deduced from the typing in the former case. The nice solution consists 
in working in a system where all potential labels are taken into account from the beginning. 

In the first part we will illustrate the above discussion and describe the intuitions. Then we will 
formalize the solution in three steps. First we extend types with record types in a more general 
framework of sorted algebras; record types will be sorted types modulo equations. The next step 
describes an extension of ML where types are sorted and taken modulo equations. Last, we apply the
results to a special case, re-using the same encoding as for the finite case. 

2.1 An intuitive approach 

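We first assume that there are only two labels a and b. Let r be the record equal to {a = 1; b = true}
and f the function which reads the a field. What happens when we apply f to r? Assuming f has
type $\tau \to \sigma$ and r has type $\rho$, we can apply f to r if the two types $\tau$ and $\rho$ are unifiable. In our
example we have

$$\begin{array}{l}
\tau : \Pi(a : \mathit{pre}(\alpha) ;\ b : \varphi_b), \\
\rho : \Pi(a : \mathit{pre}(\mathit{num}) ;\ b : \mathit{pre}(\mathit{bool})),
\end{array}$$

and $\sigma$ is equal to $\alpha$. The unification of $\tau$ and $\rho$ is done field by field; their most general unifier is

$$\left\{ \begin{array}{lcl}
\alpha & \mapsto & \mathit{num} \\
\varphi_b & \mapsto & \mathit{pre}(\mathit{bool})
\end{array} \right.$$

If we had one more label c, the types $\tau$ and $\rho$ would be

$$\begin{array}{l}
\tau : \Pi(a : \mathit{pre}(\alpha) ;\ b : \varphi_b ;\ c : \varphi_c), \\
\rho : \Pi(a : \mathit{pre}(\mathit{num}) ;\ b : \mathit{pre}(\mathit{bool}) ;\ c : \mathit{abs}),
\end{array}$$

and their most general unifier

$$\left\{ \begin{array}{lcl}
\alpha & \mapsto & \mathit{num} \\
\varphi_b & \mapsto & \mathit{pre}(\mathit{bool}) \\
\varphi_c & \mapsto & \mathit{abs}
\end{array} \right.$$

We can again replay with one more label d. We would have the types

$$\begin{array}{l}
\tau : \Pi(a : \mathit{pre}(\alpha) ;\ b : \varphi_b ;\ c : \varphi_c ;\ d : \varphi_d), \\
\rho : \Pi(a : \mathit{pre}(\mathit{num}) ;\ b : \mathit{pre}(\mathit{bool}) ;\ c : \mathit{abs} ;\ d : \mathit{abs}),
\end{array}$$

and their most general unifier

$$\left\{ \begin{array}{lcl}
\alpha & \mapsto & \mathit{num} \\
\varphi_b & \mapsto & \mathit{pre}(\mathit{bool}) \\
\varphi_c & \mapsto & \mathit{abs} \\
\varphi_d & \mapsto & \mathit{abs}
\end{array} \right.$$

Since labels c and d appear neither in r nor in f, it was obvious that fields c and d would
behave the same, and that all their type components would be equal up to renaming of variables,
i.e. isomorphic types. So we can guess the component of the most general unifier on any new field
$\ell$ only by taking a copy of its component on the c or d field. Instead of writing the types of all the
fields, we need only to write a template for all fields whose types are isomorphic, and the types of
significant fields, i.e. those which are not isomorphic to the template.

$$\begin{array}{l}
\tau : \Pi(a : \mathit{pre}(\alpha) ;\ b : \varphi_b ;\ \infty : \varphi_\infty), \\
\rho : \Pi(a : \mathit{pre}(\mathit{num}) ;\ b : \mathit{pre}(\mathit{bool}) ;\ \infty : \mathit{abs}).
\end{array}$$

The expression $\Pi((\ell : \tau_\ell)_{\ell \in I} ;\ \infty : \sigma_\infty)$ should be read as

$$\Pi_{\ell \in \mathcal{L}} \left\{ \begin{array}{ll}
\tau_\ell & \text{if } \ell \in I \\
\sigma_\ell & \text{otherwise, where } \sigma_\ell \text{ is a copy of } \sigma_\infty
\end{array} \right.$$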
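
This reading of a template can be sketched as an expansion onto a chosen finite set of labels. The Python below is only an illustration: the tuple encoding of types and the names `copy_for` and `expand` are assumptions of the example, and a "copy" of the template is obtained by giving its variables a name private to each label.

```python
def is_var(t):
    """Type variables are plain strings; other types are tuples (symbol, args...)."""
    return isinstance(t, str)


def copy_for(template, label):
    """Copy `template`, renaming each of its variables for the given label."""
    if is_var(template):
        return f"{template}_{label}"
    return (template[0],) + tuple(copy_for(arg, label) for arg in template[1:])


def expand(fields, template, labels):
    """Give every label its significant field type, or else a copy of the template."""
    return {l: fields[l] if l in fields else copy_for(template, l) for l in labels}


NUM, BOOL, ABS = ("num",), ("bool",), ("abs",)


def pre(t):
    return ("pre", t)


# The two types of the example, expanded onto the labels a, b, c, d.
tau = expand({"a": pre("alpha"), "b": "phi_b"}, "phi", "abcd")
rho = expand({"a": pre(NUM), "b": pre(BOOL)}, ABS, "abcd")
```

With these definitions the c and d components of `tau` are the fresh variables `phi_c` and `phi_d`, and those of `rho` are both abs, as in the expanded types written out earlier.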

But we can directly calculate the most general unifier without developing this expression, which
will actually allow the set of labels to be infinite. We summarize the above different views in this
figure:

[Figure: summary of the views above; not reproduced here]


This approach is so intuitive that it seems very simple. There is a difficulty though, due to the 
sharing among different templates. Sometimes a field has to be extracted from its template, because 
it must be unified with a significant field. 

The macroscopic operation that we need is the transformation of a template t into a copy s 
which will be the type of the extracted field and another copy r to become the new template. We 
regenerate the template during an extraction mainly because of sharing. But it is also intuitive that 
once a field has been extracted the retained template should remember that and thus it cannot be 
the same. In order to keep sharing, we must extract a field step by step starting from the leaves. 

For a template variable $\alpha$, the extraction consists in replacing that variable by two fresh variables
$\beta$ and $\gamma$, more precisely by the term $\ell : \beta ; \gamma$. This is exactly the substitution

$$\alpha \mapsto \ell : \beta ; \gamma$$

For a small 3 term $f(\alpha)$, assume that we already extracted field $\ell$ from $\alpha$, i.e. we have $f(\ell : \beta ; \gamma)$;
we now want to replace it by $\ell : f(\beta) ; f(\gamma)$. How can we do that? We simply ask it to be true,
i.e. we assume the axiom

$$f(\ell : \beta ; \gamma) = \ell : f(\beta) ; f(\gamma)$$

We do that for every symbol but $\Pi$ 4. We have built a record extension of types which is described
in the next section.
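
The extraction step can be sketched as a unification procedure on rows. This Python illustration is a simplification and not the algorithm of [Rem90c]: it assumes that templates are either a variable or the constant abs (which covers the examples of this paper), it ignores sorts, and it omits the occurs check, so it may loop on unsolvable cyclic problems. The tuple encoding and all function names are assumptions of the example.

```python
import itertools

_counter = itertools.count()


def fresh(prefix):
    """Return a variable name that has not been generated before."""
    return f"{prefix}{next(_counter)}"


def is_var(t):
    return isinstance(t, str)


def resolve(t, subst):
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def row(label, head, rest):
    """The term (label : head ; rest)."""
    return ("row", label, head, rest)


def extract(label, r, subst):
    """Rewrite row r as (label : head ; rest); return (head, rest, subst)."""
    r = resolve(r, subst)
    if is_var(r):  # alpha |-> label : beta ; gamma, with beta and gamma fresh
        head, rest = fresh("beta"), fresh("gamma")
        return head, rest, {**subst, r: row(label, head, rest)}
    if r == ("abs",):  # distributivity for a constant: abs = label : abs ; abs
        return r, r, subst
    if r[0] != "row":
        raise TypeError(f"cannot extract field {label} from {r[0]}")
    _, other, head, rest = r
    if other == label:
        return head, rest, subst
    h, new_rest, subst = extract(label, rest, subst)  # left commutativity
    return h, row(other, head, new_rest), subst


def unify(s, t, subst):
    s, t = resolve(s, subst), resolve(t, subst)
    if s == t:
        return subst
    if is_var(s):
        return {**subst, s: t}
    if is_var(t):
        return {**subst, t: s}
    if t[0] == "row" and s[0] != "row":
        s, t = t, s
    if s[0] == "row":  # extract the same field from t, then unify field by field
        _, label, head, rest = s
        h, r, subst = extract(label, t, subst)
        return unify(rest, r, unify(head, h, subst))
    if s[0] != t[0] or len(s) != len(t):
        raise TypeError(f"collision between {s[0]} and {t[0]}")
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
    return subst


STRING, NUM, ABS = ("string",), ("num",), ("abs",)


def pre(t):
    return ("pre", t)


def pi(r):
    return ("Pi", r)


car = pi(row("name", pre(STRING), row("age", pre(STRING), row("id", pre(NUM), ABS))))
truck = pi(row("name", pre(STRING), row("id", pre(NUM), ABS)))

# Argument type of a function reading the age field, unified with the type of car.
subst = unify(pi(row("age", pre("alpha"), "rho")), car, {})
age_of_car = resolve("alpha", subst)  # the type string
```

Unifying `car` with `truck` directly fails with a collision between pre and abs: the age field has to be extracted from the abs template of `truck`, and the extracted abs does not unify with pre(string).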

3 A term of height one.

4 We could wish to do it for the symbol $\Pi$ as well in order to allow the template to be composed of records itself,
but this will not be needed for our application and we will not study this complicated case here.

2.2 Extending a sorted free algebra with record terms

Because we want a general solution where the encoding of records is not specified, we will study the
problem independently of any particular signature and focus on the construction of record types.
In this section we forget the semantics of types and simply think of them as the terms of a free
sorted algebra.

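Let $\mathcal{V}$ be a set of variables, $\mathcal{C}$ be a set of symbols and $\mathcal{K}$ be a set of sorts. We write $\mathcal{K}^+$ for the union

$$\mathcal{K}^+ = \bigcup_{n > 0} \mathcal{K}^n$$

Let $\Sigma$ be a function from $\mathcal{C}$ to $\mathcal{K}^+$, also called a signature of $\mathcal{C}$. We write $f :_{\Sigma} \iota_0 \otimes \cdots \otimes \iota_{n-1} \Rightarrow \iota_n$ for
$\Sigma : f \mapsto (\iota_0, \ldots, \iota_n)$. We extend the terms of the free sorted algebra $\mathcal{T}(\Sigma, \mathcal{V})$ with record terms.


Unsorted record terms

We call the terms of the free unsorted algebra $\mathcal{T}(\mathcal{D}, \mathcal{V})$, where $\mathcal{D}$ is the set of symbols

$$\mathcal{C} \cup \{\Pi\} \cup \{@_\ell \mid \ell \in \mathcal{L}\},$$

unsorted record terms. We write $a : \alpha ; \beta$ for $@_a(\alpha, \beta)$, and $a : \alpha ; b : \beta ; \gamma$ for $a : \alpha ; (b : \beta ; \gamma)$.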


Example 1 The expressions

$$\Pi(a : \mathit{pre}(\mathit{num}) ;\ c : \mathit{pre}(\mathit{bool}) ;\ \mathit{abs})$$

and

$$\Pi(a : \mathit{pre}(b : \mathit{num} ;\ \mathit{num}) ;\ \mathit{abs})$$

are unsorted record terms. In section 2.4 we will consider the former as a possible type for the
record {a = 1; c = true} but we will not give a meaning to the latter. There are too many unsorted
record terms. We define record terms using sorts to constrain their formation. Only a few of the
unsorted record terms will have associated record terms.


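Record terms

Let $\mathcal{C}'$ be a subset of $\mathcal{C}$ (by default $\mathcal{C}'$ is taken to be $\mathcal{C}$), and $\mathcal{K}'$ a subset of $\mathcal{K} \times \mathcal{K}$ (by default $\mathcal{K}'$ is
taken to be $\mathcal{K} \times \mathcal{K}$).

Definition 1 Record terms are the terms of the free sorted algebra $\mathcal{T}(\Sigma' \times \Sigma'', \mathcal{V})$ where

• the set of sorts is the product $\mathcal{K} \times \mathcal{P}$, where $\mathcal{P}$ is composed of the set $\mathcal{P}_{fin}(\mathcal{L})$ of all the finite subsets
of $\mathcal{L}$, extended with a sort constant $\varepsilon$:

$$\mathcal{P} = \{\varepsilon\} \cup \mathcal{P}_{fin}(\mathcal{L})$$

• the set of symbols $\mathcal{D}'$ is

$$\{f^{\varepsilon} \mid f \in \mathcal{C}\} \cup \{f^{A} \mid f \in \mathcal{C}', A \in \mathcal{P}_{fin}(\mathcal{L})\} \cup \{\Pi^{\iota \Rightarrow \kappa} \mid (\iota, \kappa) \in \mathcal{K}'\} \cup \{@_\ell^{\iota, A} \mid \iota \in \mathcal{K}, A \in \mathcal{P}_{fin}(\mathcal{L}), \ell \in \mathcal{L} \setminus A\}$$

all symbols being distinct.

• the signature $\Sigma' \times \Sigma''$ is the product signature of $\Sigma'$ and $\Sigma''$, i.e. for any symbol f in $\mathcal{D}'$,

$$f :_{\Sigma' \times \Sigma''} (\iota_i, A_i)_{i \in [1,p]} \Rightarrow (\kappa, B) \iff f :_{\Sigma'} (\iota_i)_{i \in [1,p]} \Rightarrow \kappa \ \wedge\ f :_{\Sigma''} (A_i)_{i \in [1,p]} \Rightarrow B$$

where the two signatures $\Sigma'$ and $\Sigma''$ on $\mathcal{D}'$ are defined by the assertions below, in which $A^{\varrho(f)}$ stands for $A \otimes \cdots \otimes A$ with as many factors as the arity of f:

$$\begin{array}{lll}
\forall f \in \mathcal{C}, & f^{\varepsilon} :_{\Sigma'} \Sigma(f) & f^{\varepsilon} :_{\Sigma''} \varepsilon^{\varrho(f)} \Rightarrow \varepsilon \\
\forall f \in \mathcal{C}', \forall A \in \mathcal{P}_{fin}(\mathcal{L}), & f^{A} :_{\Sigma'} \Sigma(f) & f^{A} :_{\Sigma''} A^{\varrho(f)} \Rightarrow A \\
\forall\, \iota \Rightarrow \kappa \in \mathcal{K}', & \Pi^{\iota \Rightarrow \kappa} :_{\Sigma'} \iota \Rightarrow \kappa & \Pi^{\iota \Rightarrow \kappa} :_{\Sigma''} \emptyset \Rightarrow \varepsilon \\
\forall \iota \in \mathcal{K}, \forall A \in \mathcal{P}_{fin}(\mathcal{L}), \forall \ell \in \mathcal{L} \setminus A, & @_\ell^{\iota, A} :_{\Sigma'} \iota \otimes \iota \Rightarrow \iota & @_\ell^{\iota, A} :_{\Sigma''} \varepsilon \otimes (A \cup \{\ell\}) \Rightarrow A
\end{array}$$

□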
Superscript erasure

We define a function erasure from the record terms to the unsorted record terms which removes all
the superscripts of symbols. It is easy to show that for any unsorted term $\tau$, any sort $\iota$ and any
element A of $\mathcal{P}$, there is at most one record term $\tau'$ such that

1. $\tau$ is the erasure of $\tau'$,

2. $(\iota, A)$ is the signature of $\tau'$.

This property allows us to define a record term by giving its erasure and signature, which we shall 
usually do. Moreover we shall not write the signature when it is implicit in the context or does not 
matter. 

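Example 2 The erasure of

$$\Pi^{\mathit{field} \Rightarrow \mathit{usual}}\Big( @_a^{\mathit{field}, \emptyset}\big( \mathit{pre}^{\varepsilon}(\mathit{num}^{\varepsilon}),\ @_c^{\mathit{field}, \{a\}}\big( \mathit{pre}^{\varepsilon}(\mathit{bool}^{\varepsilon}),\ \mathit{abs}^{\{a, c\}} \big) \big) \Big)$$

is the unsorted record term

$$\Pi(a : \mathit{pre}(\mathit{num}) ;\ c : \mathit{pre}(\mathit{bool}) ;\ \mathit{abs})$$

but there is no record term whose erasure would be

$$\Pi(a : \mathit{pre}(b : \mathit{num} ;\ \mathit{num}) ;\ \mathit{abs})$$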

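Definition 2 The $(\mathcal{C}', \mathcal{K}')$-record extension of the free sorted algebra $\mathcal{T}(\Sigma, \mathcal{V})$ over the set of labels
$\mathcal{L}$ is the sorted algebra $\mathcal{T}(\Sigma' \times \Sigma'', \mathcal{V})/E$ where E is composed of

• distributivity axioms, for all symbols $f :_{\Sigma} (\iota_i)_{i \in [1,p]} \Rightarrow \kappa$ in $\mathcal{C}'$ and finite subsets of labels A which do
not contain a,

$$f^{A}\big(@_a^{\iota_i, A}(\alpha_i, \beta_i)\big)_{i \in [1,p]} = @_a^{\kappa, A}\big(f^{\varepsilon}(\alpha_i)_{i \in [1,p]},\ f^{A \cup \{a\}}(\beta_i)_{i \in [1,p]}\big)$$

• left commutativity axioms, for all sorts $\iota$ and finite sets of labels A which do not contain the labels a
and b,

$$@_a^{\iota, A}\big(\alpha,\ @_b^{\iota, A \cup \{a\}}(\beta, \gamma)\big) = @_b^{\iota, A}\big(\beta,\ @_a^{\iota, A \cup \{b\}}(\alpha, \gamma)\big)$$

□

Without the superscripts, the equations are written

• distributivity

$$f(a : \alpha_i ;\ \beta_i)_{i \in [1,p]} = \big(a : f(\alpha_i)_{i \in [1,p]} ;\ f(\beta_i)_{i \in [1,p]}\big)$$

• left commutativity

$$(a : \alpha ;\ b : \beta ;\ \gamma) = (b : \beta ;\ a : \alpha ;\ \gamma)$$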

Example 3 In the term

$$\Pi(a : \mathit{pre}(\mathit{num}) ;\ c : \mathit{pre}(\mathit{bool}) ;\ \mathit{abs})$$

we can replace abs by $b : \mathit{abs} ; \mathit{abs}$ and use the left commutativity axioms to obtain the term

$$\Pi(a : \mathit{pre}(\mathit{num}) ;\ b : \mathit{abs} ;\ c : \mathit{pre}(\mathit{bool}) ;\ \mathit{abs})$$

In the term

$$\Pi(a : \mathit{pre}(\alpha) ;\ \varphi)$$

we can substitute $b : \varphi_b ; c : \varphi_c ; \psi$ for $\varphi$ to get

$$\Pi(a : \mathit{pre}(\alpha) ;\ b : \varphi_b ;\ c : \varphi_c ;\ \psi)$$

which can be unified with the previous term field by field.

In [Rem90c] we proved the following theorem: 

Theorem 1 Unification in the record extension $\mathcal{T}(\Sigma' \times \Sigma'', \mathcal{V})/E$ of any free sorted algebra $\mathcal{T}(\Sigma, \mathcal{V})$
is decidable and unitary (every solvable unification problem has a principal unifier).


2.3 Extending the types of ML with a sorted equational theory 

In this section we consider a sorted regular theory $\mathcal{T}/E$ for which unification is decidable and
unitary. Recall that a regular theory is one whose axioms always have the same set of variables
on their left and right hand sides. For any term $\tau$ of $\mathcal{T}/E$ we write $\mathcal{V}(\tau)$ for the set of its variables.

We studied the possibility of adding a sorted equational theory over the types of ML in [Rem90a].
We recall here the main definitions and results. The language ML that we study is pure
lambda-calculus extended with a LET construction. We assume a set of variables $\mathcal{V}_e$.

Definition 3 The set of terms in ML is the smallest set containing $\mathcal{V}_e$ and such that if x is a
variable and M and N are terms, then so are $M\,N$, $\lambda x.M$ and let x = M in N. □


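We define a relation $\vdash_{\mathcal{T}/E}$, or $\vdash$ for short, called typing judgments, of the form $\Gamma \vdash M : \forall X \cdot \tau$
where

• $\Gamma$ is a typing environment, i.e. a list of assertions of the form $x : \forall X \cdot \tau$ where x is an ML
variable, X is a set of type variables and $\tau$ a type,

• M is a term,

• X is a finite set of type variables and $\tau$ is a type, and we abbreviate $\forall \emptyset \cdot \tau$ by $\tau$,

• the relation $\leq$ ("is more general than") is defined by

$$\forall X \cdot \tau \leq \forall Y \cdot \sigma \iff \exists \mu : X \to \mathcal{T},\ \ \sigma =_E \tau\mu \ \wedge\ (\mathcal{V}(\tau) \setminus X) \cap Y = \emptyset$$

by the set of inference rules ($ML_{\mathcal{T}/E}$):

$$(\text{VAR}) \quad \frac{\forall X \cdot \tau = \Gamma(x)}{\Gamma \vdash x : \forall X \cdot \tau}$$

$$(\text{INST}) \quad \frac{\Gamma \vdash M : \forall X \cdot \tau \qquad \forall X \cdot \tau \leq \forall Y \cdot \sigma}{\Gamma \vdash M : \forall Y \cdot \sigma}$$

$$(\text{GEN}) \quad \frac{\Gamma \vdash M : \forall X \cdot \tau \qquad Y \cap \mathcal{V}(\Gamma) = \emptyset}{\Gamma \vdash M : \forall (X \cup Y) \cdot \tau}$$

$$(\text{FUN}) \quad \frac{\Gamma[x : \tau] \vdash M : \sigma}{\Gamma \vdash \lambda x.M : \tau \to \sigma}$$

$$(\text{APP}) \quad \frac{\Gamma \vdash M : \tau \to \sigma \qquad \Gamma \vdash N : \tau}{\Gamma \vdash M\,N : \sigma}$$

$$(\text{LET}) \quad \frac{\Gamma \vdash M : \forall X \cdot \tau \qquad \Gamma[x : \forall X \cdot \tau] \vdash N : \sigma}{\Gamma \vdash \text{let } x = M \text{ in } N : \sigma}$$

$$(\text{EQUAL}) \quad \frac{\Gamma \vdash M : \tau \qquad \tau =_E \sigma}{\Gamma \vdash M : \sigma}$$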

All rules but EQUAL are the usual rules for ML. The EQUAL rule is necessary since the
equality on types is taken modulo the equations.

The problem of type inference for ML is stated as follows. Given a context $\Gamma$ and a term M,
find all the substitutions $\mu$ and type schemes $\forall X \cdot \tau$ such that $\Gamma\mu \vdash M : \forall X \cdot \tau$. The principal
type property means that the previous set is either empty, which is decidable, or of the form
$\{(\mu\beta,\ \forall X \cdot \tau\beta) \mid \beta \in S\}$ where the pair $(\mu, \forall X \cdot \tau)$, called the principal solution, is computable, and
S denotes the set of all substitutions.


Theorem 2 If the sorted theory $\mathcal{T}/E$ is regular and its unification is decidable and unitary, then
the relation $\vdash_{\mathcal{T}/E}$ has the principal type property.


2.4 Record type reconstruction 

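In this section we apply the two preceding theorems to extend the types of ML with record types,
then we introduce the operations on records following the finite case.

We start with a set of symbols $\mathcal{C}$ containing at least the arrow symbol $\to$, and two symbols pre
and abs. The $\Pi$ symbol will be provided by the record extension. Let $\mathcal{V}$ be a denumerable set
of type variables. Let $\mathcal{K}$ be the set of two sorts usual and field, and $\Sigma$ the following signature of
symbols in $\mathcal{C}$ over $\mathcal{K}$ (with $\varrho(f)$ the arity of f):

$$\begin{array}{l}
\forall f \in \mathcal{C} \setminus \{\mathit{pre}, \mathit{abs}\},\quad f : \underbrace{\mathit{usual} \otimes \cdots \otimes \mathit{usual}}_{\varrho(f)} \Rightarrow \mathit{usual} \\
\mathit{pre} : \mathit{usual} \Rightarrow \mathit{field} \\
\mathit{abs} : \mathit{field}
\end{array}$$

We write $\mathcal{T}(\Sigma)$ for the free algebra $\mathcal{T}(\Sigma, \mathcal{V})$ and $\mathcal{R}(\Sigma)/E$ for the record extension $\mathcal{T}(\Sigma' \times \Sigma'', \mathcal{V})/E$ as
defined in section 2.2.

We call $ML(\Sigma)$ the language ML with the typing relation $\vdash_{\mathcal{R}(\Sigma)/E}$. It follows from theorem 2
and the above properties that the relation $\vdash_{\mathcal{R}(\Sigma)/E}$ is decidable and has the principal type property.

Following the finite case we assume in the language $ML(\Sigma)$ a primitive environment composed
of the (denumerable) set of assertions

$$\begin{array}{l}
\mathit{null} : \Pi(\mathit{abs}) \\
\mathit{extract}_a : \Pi(a : \mathit{pre}(\alpha) ;\ \varphi) \to \alpha \\
\mathit{new}_a : \Pi(a : \varphi ;\ \psi) \to \alpha \to \Pi(a : \mathit{pre}(\alpha) ;\ \psi)
\end{array}$$

System $\Pi$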

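It is convenient to use a smoother notation, adding the following macro-syntax facilities:

$$\begin{array}{rcl}
\{\} & \equiv & \mathit{null} \\
\{r \text{ with } a = x\} & \equiv & \mathit{new}_a\ r\ x \\
r.a & \equiv & \mathit{extract}_a\ r \\
\{a_1 = x_1 ; \ldots ; a_n = x_n\} & \equiv & \{\{a_1 = x_1 ; \ldots ; a_{n-1} = x_{n-1}\} \text{ with } a_n = x_n\}
\end{array}$$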
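
The last macro unfolds a record literal into nested applications of the new primitives, starting from null. A minimal sketch of that unfolding in Python, producing the desugared term as a string; the function name `desugar` and the textual spelling `new_a` are assumptions of this illustration.

```python
def desugar(fields):
    """Unfold [(a1, x1), ..., (an, xn)] into nested applications of new_a.

    Each step applies the macro {r with a = x} = new_a r x to the term built so far.
    """
    term = "null"  # {} is the empty record
    for label, value in fields:
        term = f"(new_{label} {term} {value})"
    return term


person = desugar([("name", '"Tim"'), ("age", "31"), ("id", "5656787")])
```

For instance, `desugar([("name", '"Tim"'), ("age", "31")])` yields the string `(new_age (new_name null "Tim") 31)`: the first field is the innermost application and the last field the outermost.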

We illustrate this system by examples in the next section. 
3 Playing with records

In this section we first show on very simple examples how most of the operations are solved by 
this system, then we meet its limitations. Some of them find a remedy by slightly improving the 
encoding. Last we propose and discuss some further extensions. 

3.1 A demonstration of the game 

A typechecking prototype has been implemented in the language CAML. It was used to automatically
type all the examples presented here and preceded by the # character. We start with very
simple examples and end with a short program.

When building simple record values 

#let car = {name = "Toyota"; age = "old"; id = 7866};;
car : {name : Pre (string); age : Pre (string); id : Pre (num); Abs}

#let truck = {name = "Blazer"; id = 6587867567};;
truck : {name : Pre (string); id : Pre (num); Abs}

#let person = {name = "Tim"; age = 31; id = 5656787};;
person : {name : Pre (string); age : Pre (num); id : Pre (num); Abs}

each field defined with a value of type t is significant with the type pre(t). Other fields are not,
and are gathered in the template abs. We can extend a record with a new field,

#let driver = {person with vehicle = car};; 
driver : 

{vehicle : Pre ({name : Pre (string); age : Pre (string); id : Pre (num); Abs}); 
name : Pre (string); age : Pre (num); id : Pre (num); Abs} 

whether the field was previously undefined as above, or defined as below: 

#let truck_driver = {driver with vehicle = truck};;
truck_driver :

{vehicle : Pre ({name : Pre (string); id : Pre (num); Abs}); name : Pre (string); 
age : Pre (num); id : Pre (num); Abs} 

But we do not provide an and construction. 

The only construction for accessing fields is the “dot” operation. 

#let age x = x.age;;
age : {age : Pre ('a); 'p} → 'a

#let id x = x.id;;
id : {id : Pre ('a); 'p} → 'a

The accessed field must be defined with a value of type 'a, so it has type Pre ('a), and other fields
may or may not be defined; they are gathered in the template variable 'p. The return value has
type 'a. To illustrate the plain functionality, we pass age as an argument to another function in the
following example.

#let car_info field = field car;;
car_info : ({name : Pre (string); age : Pre (string); id : Pre (num); Abs} → 'a) → 'a

#car_info age;;
it : string

The function eq below takes two records both possessing an id field of the same type, but we do
not care about the other fields. For simplicity of examples we assume a polymorphic equality equal
on numbers.

#let eq x y = equal x.id y.id;;
eq : {id : Pre ('a); 'p} → {id : Pre ('a); 'q} → bool

#eq car truck;;
it : bool

We will see more examples in section 3.3. Let us turn to the counter-examples. 

3.2 Limitations 

There are mainly two kinds of limitations: one is due to the encoding method, the other to ML 
generic polymorphism. 

The only source of polymorphism in record operations is generic polymorphism. Now, we saw 
that in a record value, a field defined with a value of type t is typed by pre (t). Once a field is 
defined, every function must see it as defined. This forbids merging two records with different sets of 
defined fields. We will use the following function to shorten examples: 

#let choice x y = if true then x else y;; 
choice : 'a -> 'a -> 'a 

We then fail with 

#choice car truck;; 

Typechecking error: collision between Pre (string) and Abs 

because the age field is undefined in truck but defined in car. This is really a weakness, for the 
program 

#(choice car truck).name;; 

Typechecking error: collision between Pre (string) and Abs 

which is equivalent to (but more efficient than) the program 

#choice car.name truck.name;; 
it : string 

may actually be useful. We will give a partial solution to this problem, and suggest a full but 
expensive one. 
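
The failure of choice can be reproduced with a label-by-label comparison. The sketch below is in Python and makes the same simplifying assumption of closed record types (a dict from defined labels to value types, every other label being Abs); unify and Collision are hypothetical names, and the function is far simpler than real unification with template variables.

```python
class Collision(Exception):
    """Stands in for the paper's "Typechecking error: collision"."""


def unify(left, right):
    # Both arguments of choice share the type variable 'a, so the two
    # record types must agree on every label.
    result = {}
    for label in sorted(set(left) | set(right)):
        if label not in left or label not in right:
            defined = left.get(label, right.get(label))
            raise Collision(f"collision between Pre ({defined}) and Abs")
        if left[label] != right[label]:
            raise Collision(
                f"collision between {left[label]} and {right[label]}")
        result[label] = left[label]
    return result


car = {"name": "string", "age": "string", "id": "num"}
truck = {"name": "string", "id": "num"}

try:
    unify(car, truck)  # the constraint generated by: choice car truck
except Collision as error:
    print("Typechecking error:", error)

# choice car.name truck.name only has to unify the two field types.
print(car["name"] == truck["name"])  # True
```

The age field is the first label on which the two types disagree, so the reported collision is between Pre (string) and Abs, as in the transcript above.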

A natural generalization of the above eq function is to abstract over the field which is used for testing 
equality: 

#let field_eq field x y = equal (field x) (field y);; 
field_eq : ('a -> 'b) -> 'a -> 'a -> bool 

It is so general that it could test equality of values other than records. We would get the old eq 
version by applying field_eq to the function id. 

#let id_eq = field_eq id;; 

id_eq : {id : Pre ('a); 'p} -> {id : Pre ('a); 'p} -> bool 
#id_eq car truck;; 

Typechecking error: collision between Pre (string) and Abs 

The last example failed. This is not surprising: since field is bound by a lambda in field_eq, 
its two occurrences have the same type, and so do both arguments x and y. In eq, by contrast, the 
arguments x and y were unlinked because they were passed to two different instances of id. This is nothing but the 
restriction of ML generic polymorphism. We emphasize that, as record polymorphism is only generic, the 
restriction applies drastically to records. 
3.3 Flexibility and Remedies 

The method for typechecking records is very flexible, for the operations on records have not been 
fixed at the beginning, but at the very end. They are parameters which can vary in many ways. 

The easiest thing to play with is changing the types of primitives. For instance, asserting that 
$\mathit{new}^a$ has the principal type 

$\mathit{new}^a : \Pi (a : \mathit{abs} ; \psi) \to \alpha \to \Pi (a : \mathit{pre}(\alpha) ; \psi)$ 

will make the extension of a record with a new field possible only if the field was previously 
undefined. This slight change gives exactly the strong restriction that appears in both attempts 
to solve Wand’s system [JM88, OB88]. Weakening the type of this primitive may be interesting in 
some cases, because the restricted construction may be easier to implement, and more efficient. 

We can freely change the types of primitives, provided we know how to implement them 
correctly, but more generally we can change the set of operations on records themselves. Since 
a defined field may not be dropped implicitly, it would be convenient to have a primitive 
that explicitly removes a field from a record, 

$\mathit{forget}^a : \Pi (a : \varphi ; \psi) \to \Pi (a : \mathit{abs} ; \psi)$, 

and to add the syntactic facility 

$\{r\ \mathit{without}\ a\} = \mathit{forget}^a\ r$. 

Our encoding also allows us to type a function which renames fields: 

$\mathit{rename}^{a \to b} : \Pi (a : \varphi ; b : \psi ; \chi) \to \Pi (a : \mathit{abs} ; b : \varphi ; \chi)$ 

Note that the renamed field may not be defined. In the result it is no longer accessible. A more 
primitive function would just exchange two fields: 

$\mathit{exchange}^{a \leftrightarrow b} : \Pi (a : \varphi ; b : \psi ; \chi) \to \Pi (a : \psi ; b : \varphi ; \chi)$ 

Then the rename constant is simply the composition 

$\mathit{forget}^a \circ \mathit{exchange}^{a \leftrightarrow b}$. 
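
The effect of these three primitives on types can be checked on closed record types. The following is a sketch in Python under the assumption that a record type is a dict from defined labels to value types, a missing label standing for an abs field; the function names mirror the primitives but are otherwise hypothetical.

```python
def forget(a, record_type):
    # Field a becomes abs; the rest of the row is unchanged.
    return {label: t for label, t in record_type.items() if label != a}


def exchange(a, b, record_type):
    # Fields a and b swap their types (either of them may be abs).
    result = {label: t for label, t in record_type.items()
              if label not in (a, b)}
    if a in record_type:
        result[b] = record_type[a]
    if b in record_type:
        result[a] = record_type[b]
    return result


def rename(a, b, record_type):
    # rename is the composition: forget a after exchanging a and b.
    return forget(a, exchange(a, b, record_type))


truck = {"name": "string", "id": "num"}

print(rename("id", "serial", truck))  # {'name': 'string', 'serial': 'num'}
print(rename("age", "id", truck))     # {'name': 'string'}
```

The second call renames a field that is not defined: the old id field is lost and the new one is abs, which is what the type of rename predicts when the type of field a is instantiated to abs.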

But the flexibility goes much further than that. The decidability of type inference does not depend 
on the specific signature of the pre and abs type symbols. The encoding of records can be revised, 
and we are going to illustrate that by presenting another system for typechecking records. 

We mentioned above that one extension of the current system should allow some polymorphism 
on record values themselves. We recall the example which we failed to type: 

#choice car truck;; 

Typechecking error: collision between Pre (string) and Abs 

because the age field was defined in car but undefined in truck. We would like the result to have 
a type with abs on this field, to guarantee that it will not be accessed, while common and compatible 
fields should remain accessible. The idea is that a defined field should be seen as undefined when 
needed. From the type point of view, this would require that a defined field with a value of type t 
be typed with both pre(t) and abs, if possible without using conjunctive types [Cop80]. 

The solution is first to force abs to be of arity 1, replacing each use of abs by $\mathit{abs}(\alpha)$, where $\alpha$ 
is a free, and so quantified, variable. Then, because we cannot write 

$\forall \varphi \cdot \varphi(\alpha)$ 

where $\varphi$ would range over abs and pre, we make abs and pre constant symbols by introducing an 
infix field symbol, written ".", and a new sort flag: 

$\mathit{pre} : \mathit{flag}$ 
$\mathit{abs} : \mathit{flag}$ 
$. : \mathit{flag} \otimes \mathit{usual} \Rightarrow \mathit{field}$ 

We now write $\mathit{pre}.\alpha$ instead of $\mathit{pre}(\alpha)$, and $\varepsilon.\alpha$ where $\varepsilon$ is a variable of the sort flag. The solution 
$\Pi^*$ is defined by the following set of primitives: 

$\mathit{null} : \Pi (\mathit{abs}.\alpha)$ 
$\mathit{extract}^a : \Pi (a : \mathit{pre}.\alpha ; \psi) \to \alpha$ 
$\mathit{new}^a : \Pi (a : \varphi ; \psi) \to \alpha \to \Pi (a : \varepsilon.\alpha ; \psi)$ 

System $\Pi^*$ 

It is easy to see that system $\Pi^*$ is more general than system $\Pi$, i.e. that any expression typable 
in the system $\Pi$ is also typable in the system $\Pi^*$: replacing, in a proof in $\Pi$, all occurrences of abs 
by $\mathit{abs}.\alpha$ and all occurrences of $\mathit{pre}(\tau)$ by $\mathit{pre}.\tau$, where $\alpha$ does not appear in the proof, we obtain a 
correct proof in $\Pi^*$. 

We retype some of the examples in the system $\Pi^*$. Building a record creates a polymorphic 
object, since all fields have a distinct flag variable: 

#let car = {name = "Toyota"; age = "old"; id = 7866};; 
car : {name : 'u.string; age : 'v.string; id : 'w.num; abs.'a} 

#let truck = {name = "Blazer"; id = 6587867567};; 
truck : {name : 'u.string; id : 'v.num; abs.'a} 

Now these two records can be merged, 

#choice car truck;; 

it : {name : 'u.string; age : abs.string; id : 'v.num; abs.'a} 

forgetting the age field in car. Note that although the presence of the field age has been forgotten, its type has 
not: we always remember the types of values which have stayed in fields. Thus we fail with 

#let person = {name = "Tim"; age = 31; id = 5656787};; 
person : {name : 'u.string; age : 'v.num; id : 'w.num; abs.'a} 

#choice person car;; 

Typechecking error: collision between num and string 

This is really a failure, since both records have the common fields name and id, which might be tested 
later, and this example would be correct in the explicitly typed language QUEST [Car89]. If we 
add a new collection of primitives 

$\mathit{forget}^a : \Pi (a : \varphi ; \psi) \to \Pi (a : \mathit{abs}.\alpha ; \psi)$, 

then we can work around the above failure by explicitly forgetting the label age in either of the two records: 

#choice {car without age} person;; 

it : {name : 'u.string; age : abs.num; id : 'v.num; abs.'a} 

#choice car {person without age};; 

it : {name : 'u.string; age : abs.string; id : 'v.num; abs.'a} 

#choice {car without age} {person without age};; 
it : {age : abs.'a; name : 'u.string; id : 'v.num; abs.'b} 
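
The behaviour of flags in these examples can be imitated by unifying the flag and the type of each field separately. The Python sketch below is a strong simplification and not the actual algorithm: it assumes closed record types in which every variable occurs only once, so an unconstrained flag or type variable is simply represented by None. The names VAR, ABSENT, record, forget and choice are hypothetical.

```python
class Collision(Exception):
    """Stands in for the paper's "Typechecking error: collision"."""


VAR = None               # an unconstrained flag variable or type variable
ABSENT = ("abs", VAR)    # abs.'a, the field of every label in the template


def unify_atom(x, y):
    # A variable unifies with anything; two constants must be identical.
    if x is VAR:
        return y
    if y is VAR or x == y:
        return x
    raise Collision(f"collision between {x} and {y}")


def record(**fields):
    # Building a record gives each defined field a fresh flag variable.
    return {label: (VAR, t) for label, t in fields.items()}


def forget(label, record_type):
    # {r without label}: the field becomes abs.'a with a fresh 'a.
    return {**record_type, label: ABSENT}


def choice(left, right):
    # choice forces both record types to be equal, field by field.
    result = {}
    for label in sorted(set(left) | set(right)):
        flag_l, type_l = left.get(label, ABSENT)
        flag_r, type_r = right.get(label, ABSENT)
        result[label] = (unify_atom(flag_l, flag_r),
                         unify_atom(type_l, type_r))
    return result


car = record(name="string", age="string", id="num")
truck = record(name="string", id="num")
person = record(name="string", age="num", id="num")

print(choice(car, truck)["age"])                   # ('abs', 'string')
print(choice(forget("age", car), person)["age"])   # ('abs', 'num')
try:
    choice(person, car)
except Collision as error:
    print("Typechecking error:", error)
```

The flag of age is forgotten when car meets truck, but its type string is kept, which is why person and car still collide on num and string until one of them explicitly forgets age.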
We can now present a more realistic example which illustrates the ability to add annotations to 
data structures and, of course, to type the presence of these annotations. The example is run in 
the system $\Pi^*$, and we assume an infix addition + of type num -> num -> num. 

#type tree ('u) = Leaf of num 
#  | Node of {left: pre.tree ('u); right: pre.tree ('u); 
#             annot: 'u.num; abs.unit} 
#;; 

New constructors declared: 

Node : {left : pre.tree ('u); right : pre.tree ('u); annot : 'u.num; abs.unit} -> tree ('u) 
Leaf : num -> tree ('u) 

The variable 'u indicates the presence of the annotation annot. For instance, this annotation is 
absent in the structure 

#let winter = 'Node {left = 'Leaf 1; right = 'Leaf 2};; 
winter : tree (abs) 

The following function annotates a structure. 

#let rec annotation = 
#  function 
#    Leaf n -> 'Leaf n, n 
#  | Node {left = r; right = s} -> 
#      let (r,p) = annotation r in 
#      let (s,q) = annotation s in 
#      'Node {left = r; right = s; annot = p+q}, p+q;; 
annotation : tree ('u) -> tree ('v) * num 

#let annotate x = match annotation x with y,_ -> y;; 
annotate : tree ('u) -> tree ('v) 

We use it to annotate the structure winter. 

#let spring = annotate winter;; 
spring : tree ('u) 

We will read a structure with the following function. 

#let read = 
#  function 
#    'Leaf n -> n 
#  | 'Node r -> r.annot;; 
read : tree (pre) -> num 

Of course, it can be applied to the value spring but not to the empty structure winter. 

#read winter;; 

Typechecking error: collision between pre and abs 

#read spring;; 
it : num 

But the function 

#let rec left = 
#  function 
#    'Leaf n -> n 
#  | 'Node r -> left (r.left);; 
left : tree ('u) -> num 

may be applied to both winter and spring. 

#left winter;; 
it : num 

#left spring;; 
it : num 


3.4 Extensions 

In this section we describe two possible extensions. Both of them have been implemented in a 
prototype, but not completely formalized yet. 

One important motivation for having records was the encoding of some object-oriented features 
into them. But the usual encoding uses recursive types [Car84, Wan89]. An extension of ML 
with variant types is very easy once we have record types, following the idea of [Rem89], but the 
extension is actually interesting only if recursive types are allowed. 

Thus it would be necessary to extend the results presented here with recursive types. Unification 
on rational trees in the empty theory is well understood [Hue76, MM82]. In the case of a finite set 
of labels, the extension of Theorem 2 to rational trees is easy. The infinite case uses an equational 
theory, and unification in its extension with rational trees has no decidable and unitary algorithm 
in general, even when the original theory has one. But the specificity of the record theory leads us to conjecture 
that it can be extended with regular trees. 

Another extension, which was sketched in [Rem89], partially lifts the restrictions due to ML 
polymorphism. Because subtyping polymorphism goes through lambda abstractions, it could be 
used to solve some of the examples we failed with. ML type inference with subtyping polymorphism 
was first studied by Mitchell in [Mit84] and later by Mishra and Fuh in [FM88, FM89]. The 
LET case has only been treated in [Jat89]. But as for recursive types, subtyping has never been 
studied in the presence of an equational theory. Though the general case is certainly difficult, we 
conjecture that subtyping is compatible with the record theory. We present below an extension 
with subtyping in the finite case. The extension in the infinite case would be similar but it would 
depend on the previous conjecture. 

It is straightforward to extend the results of [FM89] to deal with sorted types. It is thus 
possible to embed the language $\Pi_f$ into a language with inclusion. In fact, we will start with the 
language $\Pi_f^*$, which is the finite case solution but with the signature of the language $\Pi^*$. One reason 
for that could be that this language is more powerful than $\Pi_f$, but a more technical reason will 
appear later. We make very little use of subtyping, for we assume only the atomic coercion 

$\mathit{pre} \subset \mathit{abs}$, 

which says that if a field is defined, it can also be considered as undefined. We would assign the 
following types to the primitives for records: 

$\mathit{null} : \Pi (\mathit{abs}.\alpha_1, \ldots, \mathit{abs}.\alpha_l)$ 
$\mathit{extract}^a : \Pi (\varphi_1, \ldots, \mathit{pre}.\alpha, \ldots, \varphi_l) \to \alpha$ 
$\mathit{new}^a : \Pi (\varphi_1, \ldots, \varphi_l) \to \alpha \to \Pi (\varphi_1, \ldots, \mathit{pre}.\alpha, \ldots, \varphi_l)$ 

System $\Pi_f^{\subset}$. 

Note that although the types look the same, they are taken modulo inclusion, and are thus more 
polymorphic. In this system, we could type 

let id_eq = field_eq id;; 

with 

id_eq : {id : Pre.'a; 'p} -> {id : Pre.'a; 'p} -> bool 

modulo subtyping, which allows the following 

id_eq car truck;; 

The field age is implicitly forgotten in car by the inclusion rules. 

But we would still fail with the example 

choice person car;; 

because we could forget the presence of fields but not their types, and there is presently a 
mismatch between num and string for the age field. 
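
The effect of the atomic coercion from pre to abs on these two examples can be sketched as the computation of a common supertype. This Python model is an illustration only, not the inference algorithm of [FM89]: it assumes closed record types given as a dict from labels to (flag, type) pairs, with missing labels standing for abs fields of unknown type (None). The names join_flag, common_supertype and id_eq_type are hypothetical.

```python
class Collision(Exception):
    """Stands in for the paper's "Typechecking error: collision"."""


def join_flag(x, y):
    # pre is included in abs: a defined field may be viewed as undefined,
    # so the least common supertype of pre and abs is abs.
    return "pre" if x == y == "pre" else "abs"


def common_supertype(left, right):
    result = {}
    for label in sorted(set(left) | set(right)):
        flag_l, type_l = left.get(label, ("abs", None))
        flag_r, type_r = right.get(label, ("abs", None))
        # Only the flag can be coerced; the types must still agree.
        if type_l is not None and type_r is not None and type_l != type_r:
            raise Collision(f"collision between {type_l} and {type_r}")
        kept = type_l if type_l is not None else type_r
        result[label] = (join_flag(flag_l, flag_r), kept)
    return result


def id_eq_type(x, y):
    # Both arguments share one type, which must define the id field.
    shared = common_supertype(x, y)
    flag, _ = shared.get("id", ("abs", None))
    if flag != "pre":
        raise Collision("collision between pre and abs")
    return "bool"


car = {"name": ("pre", "string"), "age": ("pre", "string"),
       "id": ("pre", "num")}
truck = {"name": ("pre", "string"), "id": ("pre", "num")}
person = {"name": ("pre", "string"), "age": ("pre", "num"),
          "id": ("pre", "num")}

print(id_eq_type(car, truck))  # bool
try:
    common_supertype(person, car)
except Collision as error:
    print("Typechecking error:", error)
```

In this model the coercion acts on flags only, so it rescues id_eq car truck but cannot reconcile the num and string types stored under age.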

The solution would be to use the system $\Pi_f$ instead of $\Pi_f^*$. But the difficulty is that the inclusion 
we need is 

$\mathit{pre}(\alpha) \subset \mathit{abs}$ 

which is not atomic. Such coercions are not allowed: type inference with inclusion and non-atomic 
coercions has not been studied yet. The types of the primitives for records would be the same as in 
the system $\Pi_f$, but modulo this inclusion. 


Conclusion 

We described a simple, flexible and efficient solution for extending ML with operations on records 
allowing some kind of inheritance. The solution uses an independent extension of ML with a sorted 
equational theory over types. An immediate improvement is to allow recursive types, needed in 
many applications of records. 

The main limitation of our solution is ML polymorphism, but we conjecture a way of going 
around it. It is not clear yet whether we would want such an extension, for it might not be worth the 
extra cost in type inference. 

This system may be used to add object-oriented features, and we hope that ML will regain the 
attraction that it has been losing to the benefit of explicitly typed languages. 